Learning in non-stationary MDPs as transfer learning
Abstract
In this paper we present a learning algorithm for a particular subclass of non-stationary environments in which the learner must interact with other agents. The behavior policies of these agents are determined by a latent variable that changes rarely but can modify the agent policies drastically when it does change (like traffic conditions in a driving problem). This unpredictable change in the latent variable causes the non-stationarity. We frame this problem as transfer learning in a particular subclass of MDPs, where each task/MDP requires the learner to interact with opponent agents that have fixed policies. Across tasks, the state and action spaces remain the same (and are known), but the agent policies change. We transfer information from previous tasks to quickly infer the combined agent behavior policy in a new task after limited initial exploration, and hence rapidly learn an optimal or near-optimal policy. We propose a transfer algorithm that, given a collection of source behavior policies, eliminates the policies that do not apply in the new task in time polynomial in the relevant parameters, using a novel statistical test. We also perform experiments in three interesting domains and show that our algorithm significantly outperforms relevant baselines.
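The elimination step described above can be illustrated with a toy likelihood test: a source policy survives only if the actions observed during initial exploration remain plausible under it. The policy representation, the threshold, and the test itself are illustrative assumptions here, not the paper's actual procedure:

```python
import math

def eliminate_policies(observations, source_policies, threshold=1e-3):
    """Keep only source policies consistent with observed (state, action) pairs.

    observations: list of (state, action) pairs from the new task.
    source_policies: dict mapping a name to {state: {action: prob}}.
    A policy is eliminated when the likelihood of the observed actions
    under it falls below `threshold` (a stand-in for the paper's
    statistical test, which is not specified in this abstract).
    """
    surviving = {}
    log_thresh = math.log(threshold)
    for name, pi in source_policies.items():
        log_lik = 0.0
        for s, a in observations:
            p = pi.get(s, {}).get(a, 0.0)
            log_lik += math.log(p) if p > 0 else float("-inf")
        if log_lik >= log_thresh:
            surviving[name] = pi
    return surviving
```

Each policy is checked independently, so the elimination pass is linear in the number of source policies times the number of observations, consistent with the polynomial-time claim.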
Similar resources
Clustering Markov Decision Processes for Continual Transfer
We present algorithms to effectively represent a set of Markov decision processes (MDPs), whose optimal policies have already been learned, by a smaller source subset for lifelong, policy-reuse-based transfer learning in reinforcement learning. This is necessary when the number of previous tasks is large and the cost of measuring similarity counteracts the benefit of transfer. The source subset ...
Measuring the Distance Between Finite Markov Decision Processes
Markov decision processes (MDPs) have been studied for many decades. Recent research in using transfer learning methods to solve MDPs has shown that knowledge learned from one MDP may be used to solve a similar MDP better. In this paper, we propose two metrics for measuring the distance between finite MDPs. Our metrics are based on the Hausdorff metric which measures the distance between two su...
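The Hausdorff metric this abstract builds on can be sketched for finite sets: each set's worst-case nearest-neighbor distance to the other is taken, and the larger of the two directed distances is the result. The set representation and ground metric `d` below are assumptions for illustration, not the paper's MDP-level construction:

```python
def hausdorff(A, B, d):
    """Hausdorff distance between finite sets A and B under ground metric d.

    directed(X, Y) is the largest distance from a point in X to its
    nearest neighbor in Y; the Hausdorff distance symmetrizes it.
    """
    def directed(X, Y):
        return max(min(d(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```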
The Time Adaptive Self Organizing Map for Distribution Estimation
The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights in a varying environment. In dealing with non-stationary input distributions and changi...
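The time-decreasing learning rate and shrinking neighborhood of the basic SOM can be sketched on a one-dimensional chain of units; the exponential decay schedules and parameter values below are illustrative assumptions:

```python
import math

def som_step(weights, x, t, sigma0=1.0, eta0=0.5, tau=100.0):
    """One basic-SOM update on a 1-D chain of units.

    weights: list of unit weights, x: input sample, t: time step.
    Both the learning rate and the neighborhood width decay with t,
    which is exactly what hampers adaptation once t grows large and
    the input distribution later shifts.
    """
    eta = eta0 * math.exp(-t / tau)       # time-decreasing learning rate
    sigma = sigma0 * math.exp(-t / tau)   # shrinking neighborhood width
    bmu = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
    for i in range(len(weights)):
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        weights[i] += eta * h * (x - weights[i])
    return weights
```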
On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds ...
Publication year: 2013